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Abstract. For optimization problems with nonlinear constraints, linearly constrained Lagran- 
gian (LCL) methods sequentially minimize a Lagrangian function subject to linearized constraints. 
These methods converge rapidly near a solution but may not be reliable from arbitrary starting 
points. The well known example MINOS has proven effective on many large problems. Its success 
motivates us to propose a globally convergent variant. Our stabilized LCL method possesses two 
important properties: the subproblems are always feasible, and they may be solved inexactly. These 
features are present in MINOS only as heuristics. 

The new algorithm has been implemented in Matlab, with the option to use either the MINOS 
or SNOPT Fortran codes to solve the linearly constrained subproblems. Only first derivatives are 
required. We present numerical results on a nonlinear subset of the COPS, CUTE, and HS test- 
problem sets, which include many large examples. The results demonstrate the robustness and 
efficiency of the stabilized LCL procedure. 
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1. Introduction. For optimization problems with nonlinear constraints, linearly 
constrained Lagrangian (LCL) methods sequentially minimize a Lagrangian function 
subject to linearized constraints. As currently defined, these methods converge rapidly 
near a solution but may not be reliable from arbitrary starting points. The well known 
example MINOS [?] has proven effective on many large and small problems, especially 
within the GAMS [?] and AMPL [?] environments, and is widely used in industry and 
academia. Its success motivates us to propose a globally convergent variant of the 
LCL method. 

Our stabilized LCL algorithm solves a sequence of linearly constrained subprob- 
lems. Each subproblem minimizes an augmented Lagrangian function within a linear 
manifold that describes a current approximation to the nonlinear constraints. This 
manifold is nominally a linearization of the constraint space but may be a relaxed 
(i.e., larger) space at any stage, particularly during early iterations. Few conditions 
are imposed on the nature of the subproblem solutions; consequently, the subproblems 
may be solved with any of a variety of optimization routines for linearly constrained 
problems, providing much flexibility. 

The stabilized LCL method possesses two important properties: the subproblems 
are always feasible, and they may be solved inexactly. These features are present 
in MINOS only as heuristics. The method may be regarded as a generalization of 
sequential augmented Lagrangian methods (see, for example, [?, ?, ?]). The theory 
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we develop provides a framework that unifies Robinson's LCL method [?] with the 
bound-constrained Lagrangian (BCL) method used, for example, by LANCELOT [?]. 
In the context of our theory, the proposed algorithm is actually a continuum of meth- 
ods, with LCL and BCL methods at opposite ends of a spectrum. The stabilized 
LCL algorithm exploits this connection between BCL and LCL methods, preserving 
the fast local convergence properties of LCL methods while inheriting the global con- 
vergence properties of BCL methods. This connection is explored in more detail by 
Fricdlandcr [?]. 

Our focus is on large-scale problems. We implemented the stabilized LCL method 
using the reduced-gradient part of MINOS [?] and the sequential quadratic program- 
ming code SNOPT [?] to solve the linearly constrained subproblems. These solvers 
are most efficient on problems with few degrees of freedom. Also, they use only first 
derivatives, and consequently our implementation requires only first derivatives. We 
discuss how the stabilized LCL method might be used with first- or second-derivative 
linearly constrained solvers. 

1.1. The optimization problem. The proposed method solves nonlinearly 
constrained optimization problems of the form 



where / : R™ i— > 1R is a linear or nonlinear objective function, c : R™ R m is a 
vector of nonlinear constraint functions, A is a matrix, and I and u are vectors of 
bounds. We assume that A and the derivatives of c are sparse and that the problem 
(NP) is feasible. We recognize that not all optimization problems are feasible. This 
possibility is addressed in §3.3, where we explain how the proposed algorithm reveals 
an infeasible optimization problem and discuss properties of the points to which it 
converges. 

One of the strengths of our method is that it does not explicitly require second- 
order information. However, the fast convergence rate of the algorithm relies on 
sufficient smoothness of the nonlinear functions, indicated by the existence of second 
derivatives. We make that assumption: 

Assumption 1.1. The functions f and c are twice continuously differentiable on 
an open neighborhood containing the region 



Note that second derivatives could be used if they were available, thus accelerating 
the solutions of the subproblems and changing the properties of the solutions obtained 
by the algorithm. We discuss this further in §3.4. 

1.2. The LCL approach. The acronym LCL is new. Methods belonging to 
this class typically have been referred to in the optimization literature as sequen- 
tial linearized constraint (SLC) methods (cf. [?, ?]). The term SLC was chosen for 
compatibility with the terms sequential quadratic programming (SQP) and sequential 
linear programming (SLP). Those methods also sequentially linearize the constraints. 
The term linearly constrained Lagrangian, however, emphasizes that the Lagrangian 



(NP) 



minimize fix) 



subject to I < c(x) < u, 
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itself, and not an approximation, is used in the subproblems. Moreover, there is a 
useful relationship (which we exploit) between LCL and BCL methods, and this is 
hinted at by the nomenclature. 

The first LCL methods were proposed independently in 1972. Robinson [?] and 
Rosen and Kreuser [?] describe similar algorithms based on minimizing a sequence of 
Lagrangian functions subject to linearized constraints. Robinson is able to prove that, 
under suitable conditions, the sequence of subproblem solutions converges quadrat- 
ically to a solution of (NP). A strength of this method is that efficient large-scale 
methods exist for the solution of the linearly constrained subproblems formed at each 
iteration. Any suitable example of these subproblem solvers may be called as a black 
box. 

1.3. Other work on stabilizing LCL methods. Other approaches to stabi- 
lizing LCL algorithms include two-phase methods proposed by Rosen [?] and Van Der 
Hoek [?]. In these approaches, a Phase 1 problem is formed by moving the nonlinear 
constraints into the objective by means of a quadratic penalty function. The solution 
of the Phase 1 problem is used to initialize Robinson's method (Phase 2). With a suf- 
ficiently large penalty parameter, the Phase 1 solution will yield a starting point that 
allows Robinson's method to converge quickly to a solution. These two-phase meth- 
ods choose the penalty parameter arbitrarily, however, and do not deal methodically 
with infcasible linearizations. 

In 1981, Best et al. [?] describe a variant of the two-phase method whereby the 
Phase 1 penalty parameter is gradually increased by repeated return to the Phase 1 
problem if the Phase 2 iterations are not converging. This two-phase method differs 
further from Rosen's and Van Der Hoek's methods in that the Phase 2 iterations 
involve only those equality constraints identified as active by the Phase 1 problem. The 
authors are able to retain local quadratic convergence of the Phase 2 LCL iterations 
while proving global convergence to a stationary point. A drawback of their method is 
that it requires a fourth-order penalty term to ensure continuous second derivatives of 
the penalty objective. This requirement may introduce significant numerical difficulty 
for the solution of the Phase 1 problem (though probably a quadratic-penalty term 
would suffice in practice). 

Both two-phase methods share the disadvantage that the Phase 1 penalty prob- 
lems need to be optimized over a larger subspace than the subsequent LCL phase. 
We seek a method that retains the linearized constraints as part of the subproblem, 
in order to keep the number of degrees of freedom small; and, as in Robinson's 1972 
method, we allow the subproblem to determine the final set of active constraints. 

1.4. The generic problem. For the theoretical development of a stabilized 
LCL method, we consider a simplified, generic formulation of (NP) and take the 
optimization problem to be 




where c : R™ R m . Section 4 returns to the formulation (NP) in its discussion of 
the implementation of the stabilized LCL method. 

We define the augmented Lagrangian function corresponding to (GNP) as 

C(x,y,p) = f(x) - y T c(x) + \p\W)\\l ( L1 ) 
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where x, the m-vector y, and the scalar p are independent variables. Let g(x) denote 
the gradient of the objective function f(x), and let J(x) denote the Jacobian matrix 
of the constraint vector c(x). Denote by H(x) and Hi(x) the Hessian matrices of f(x) 
and [c(x)]i, respectively, where [■]* refers to the ith component of a vector. Define 

y(x,y,p) = y- pc(x). (1.2) 

The derivatives of C with respect to x may be written as follows: 

V x £(x,y,p) = g(x) - J(x) T y(x,y, p) (1.3) 
V 2 xx C(x, y, p) = H{x) - Yfflx, y, p^H^x) + P J(x) T J(x). (1.4) 

i=l 

We assume that problem (GNP) is feasible and has at least one point (a;*, y*, z*) 
that satisfies the first-order Karush-Kuhn- Tucker (KKT) optimality conditions. 

Definition 1.2 (First-Order Optimality Conditions). A triple (x*,y*,z*) is a 
first-order KKT point for (GNP) if for any p > all of the following hold: 

c(x*) = (1.5a) 

V x £(x*,y*,p) = z* (1.5b) 

mm(x*,z*) = 0. (1.5c) 

Note that (1.5c) implies 

x» > (1.6a) 

z» > 0, (1.6b) 

so that and z* must be primal and dual feasible, respectively. 

Let 77* > and > be specified as primal and dual convergence tolerances. We 
regard the point (x,y, z) to be an acceptable solution of (GNP) if it satisfies (1.5) to 
within these tolerances. Specifically, we identify (x, y, z) as an approximate solution 
of (GNP) if 

||c(x)||<?7* (1.7a) 
V x C(x,y,p)=z (1.7b) 
||min(x, z)\\oo < w*. (l- 7c ) 

Note that (1.7c) relaxes the nonnegativity conditions (1.6) by the same tolerance 
In practice, we might choose to relax (1.6a) to x > — 5*e, for some S* > 0. However, 
we ignore this detail for now. 

For theoretical purposes, we assume that strict complementarity and the second- 
order sufficiency conditions hold at each (x*,y»,z»). We define these conditions as 
follows. 

Definition 1.3 (Strict Complementarity). The point (x», y*, z„) satisfies strict 
complementarity if it satisfies (1.5) 8Bdmax(i»,z,) > 0. 

Definition 1.4 (Second-Order Sufficiency). The point (x*,y*,z*) satisfies the 
second-order sufficiency conditions for (GNP) if it satisfies (1.5) and strict comple- 
mentarity and if for any p > 0, 

p T V 2 xx £(x r ,y*,p)p>0 (1.8) 
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for all p ^ satisfying 

J{x*)p = and [p]j — for all j such that [x*]j = (1.9) 

(and [z*]j > 0). 

Assumption 1.5. The point (x*,y*,,2*) satisfies the second-order sufficiency 
conditions for (GNP). 

1.5. The canonical LCL method. The software package MINOS solves the 
nonlinearly constrained problem (GNP) by minimizing a sequence of augmented La- 
grangian functions subject to linearized constraints. Define the constraint lineariza- 
tion at the point x k as 

Cfc(x) = c(x k ) + J(x k )(x - x k ). 

Algorithm 1 outlines what we regard to be a canonical LCL method. It forms the basis 
for the MINOS algorithm and is based on solving the linearly constrained subproblems 



(LC fc ) minimize C(x,y k ,p k ) 

X 

subject to c k (x) = 
x > 0, 



which are parameterized by the latest estimates x k and y k , and a fixed penalty pa- 
rameter pk = p (which may be set to zero) . The linear constraints c k (x) — are the 
linearization of c at the point x k . 

Empirically, a positive penalty parameter p has proven a helpful addition to 
Robinson's method, but for other problems it has been ineffective. A theoretical 
understanding of when and how to modify the penalty term has been lacking. 

1.6. Notation. The symbol x* is used in two senses: as a limit point of the 
sequence {x k } 7 and as the primal solution of (GNP). We distinguish between the two 
cases when the context is not clear. Denote by g(x) the vector of components of g(x) 
corresponding to inactive bounds at x*, so that if T = {i € l,...,n | [x*], > 0}, 
g(x) = [g(x)]x (where is a shorthand notation for a subvector formed from the 
indices in X). Similarly, let J(x) denote the corresponding columns of the Jacobian 
matrix. 

Unless otherwise specified, the function ||x|| represents the Euclidean norm of the 
vector x. When the arguments are vectors, define the function min(-, •) component- 
wise. The following notation is used throughout: 

(x,y, z) primal variables, dual variables, and reduced costs for (GNP), 

(x*, y», z*) optimal variables for (GNP), 

(x k ,y k ,z k ) the fcth estimate of (x* , y* , z* ) , 

(x* k , Ayl, zl) solution of the fcth subproblem, 

y k y k + Ay k ; an updated multiplier estimate, 

./fc, Cfc, Jfc functions and gradients evaluated at x k , 

/*, g*, c», J* functions and gradients evaluated at x». 
The augmented Lagrangian function is particularly important for our analysis. We 
often use the shorthand notation 



C k (x) ee C(x,y k ,p k ) = /(x) - vlc{x) + ip fe ||c(x)|| 2 , 



(1.10) 
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Algorithm 1: Canonical LCL 

Input: x ,yo,z 
Output: x*, y*, z* 

[Initialize parameters] 

Set the penalty parameter p > 0. Set positive convergence tolerances 
_ w*, /?* < 1; 
k <- 0; 

converged <— false; 
repeat 

[Solve the LC subproblem] 

Solve (LCfc) to obtain a point (x k , Ay k , z k ). If there is more than one 
such point, choose (x k , Ay k , z k ) closest in norm to (xk, 0, Zk); 
_ Vl^Vk + Ayl; 

[Update solution estimates] 

|_ x k+1 <- x* k , y k+1 <- y* k , z k+1 <- z* k ; 

[Test convergence] 

|_ if (xk+i, 2/fe+i, Zk+i) satisfies (1.7) then converged <— true; 

p k <- p; [keep p fc fixed] 
fc <- fc+ 1; 
until converged; 

x* < x/j, < y k , z % < z k \ 
return x»,y*,z»; 



when y k and are fixed. The algorithms we discuss are structured around major 
and minor iterations. Each major iteration solves a subproblem and generates an 
element of the sequence {(x k , y kl z k )}. Under certain (desirable) circumstances, this 
sequence converges to a solution (x», y*, z*). For each major iteration k, there is a 
corresponding set of minor iterations converging to [x k , Ay k , z k ), the solution of the 
current subproblem. In our development and analysis of a stabilized LCL method, 
we are primarily concerned with the "outer" -level algorithm. Unless stated otherwise, 
"iterations" refers to major iterations. 



2. An Elastic LC Subproblem. The original LCL method introduced by 
Robinson [?] sets p = in Algorithm 1. A positive penalty parameter could be 
used (as it is in MINOS [?]) and may help convergence from difficult starting points. 

We recognize two particular causes of failure for the LCL method: 

• The linearized constraints may be infeasible, so that the LCL iterations are 
not denned; 

• A near-singular Jacobian J k (we only assume nonsingularity of the Jacobian 
at limit points — cf. Assumption 3.2) might lead to an arbitrarily large value 
of — %k\\ regardless of the values of y k and p k in the subproblem objective. 

To remedy both deficiencies we modify the linearized constraints used by the LCL 
method, allowing some degree of flexibility in their satisfaction. We introduce a set of 
nonnegative elastic variables, v and w, into the constraints and introduce a penalty 
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on these variables into the subproblcm objective. Hence, we define the subproblem as 



(ELC fc ) 


minimize 


C k {x) 


+ <r k e T (v + w) 


x,v,w 




subject to 


Ck(x) 


+ v — w = 








x, v, w > 0, 











where e is a vector of ones. This elastic subproblem is always feasible. Its solution 
yields a 5-tuple (x* k , Ay%, z k , v k , w%) that satisfies the first-order KKT conditions 

v,w>0 (2.1a) 
Cfc(x) +v - w = (2.1b) 
VC k (x)- JlAy = z (2.1c) 
min(a;,z)=0 (2. Id) 

Halloo < fffc- (2-lc) 

Note that V L k {x) involves y k and p k . 

The term ake T (v + w) is the ^-penalty function, and together with the non- 
negativity constraints v, w > it is equivalent to a penalty on the one-norm of (v — w). 
We find later that the bound (2.1e) is crucial for the global convergence analysis of 
the proposed method. 

We note that the elastic LC subproblem can be equivalently stated as 



(ELC' fc ) 


minimize 

X 


C k (x) +CTfe||c fe (a:)||i 




subject to 


x > 0, 



with solution (x* k , z£). This immediately reveals the stabilized LCL method's intimate 
connection with both the augmented Lagrangian function and the BCL method. Far 
from a solution, the ^i-penalty term <7fc||cfc(a;)||i gives the method an opportunity to 
deviate from the constraint linearizations. Near a solution, it keeps the iterates close 
to the linearizations. For values of a k over a threshold value, the linearized constraints 
are satisfied exactly, as required by the LCL method. 

2.1. The ^i-penalty function. For any given subproblem of the stabilized LCL 
method, the penalty term a k e T (y + w) may or may not equal zero, indicating that 
the linearized constraints may not always be satisfied. In contrast, the MINOS or the 
canonical LCL subproblems must always satisfy the linearized constraints. Thus, the 
set of active linearized constraints of the stabilized LCL subproblcm is always a sub- 
set (though not necessarily strict) of the the canonical LCL subproblcm. Fletcher [?] 
makes the same observation in connection with his S£iQP method. The global conver- 
gence properties of the stabilized LCL method do not require independent constraint 
gradients or bounded multipliers for each subproblcm (these are required only at limit 
points of the sequence generated by the algorithm). 

Recovering the BCL subproblem. Set a k = 0. Then (ELCfc) and (ELC' fc ) 
reduce to the equivalent bound-constrained minimization problem 



(BC fc ) 


minimize 

X 


C k (x) 




subject to 


x > 0, 
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where the bounds on the variables v and w have been eliminated because they no 
longer appear in the objective. The subproblcm (BCfc) is used by the BCL method 
(see, for example, Hestenes [?], Powell [?], Bertsckas [?], and Conn et al. [?, ?]). 

Recovering the LCL subproblem. The ^-penalty function is exact. If the 
linearization is feasible and at is larger than a certain threshold, v and w will be zero 
and the minimizcrs of the elastic problem (ELCfc) will coincide with the minimizcrs 
of the inelastic problem (LCfe). Exact penalty functions have been studied by [?, ?, ?] 
among others. See the book by Conn et al. [?] for a more recent discussion. 

We are particularly interested in this feature when the iterates generated by 
the stabilized LCL algorithm are approaching a solution z»). Recovering the 

canonical LCL subproblem as the iterates approach a solution ensures that the sta- 
bilized LCL method inherits LCL's fast local convergence properties. 

To prove that the condition 

UA/fclloo < <7fc (2.2) 

is sufficient to force the elastic variables to zero, we require two conditions: (i) the 
inelastic subproblem (LCfc) must satisfy the second-order sufficiency conditions at a 
solution x* k \ and (ii) x* k must be a regular point. Assumptions 1.5 and 3.2 guarantee 
that both these conditions are met. For x* k close to x», Assumption 1.5 guarantees that 
(LCfc) satisfies the second-order conditions. Assumption 3.2 guarantees the regularity 
of x* k when it is near x». Lemma 2.1 establishes the threshold value of <7fc. 

Lemma 2.1. Suppose that (x* k , Ay k , z k ) satisfies the second-order sufficiency con- 
ditions for (LCfc). Then if (2.2) holds, {x* k ,Ay* k ,zD also solves (ELCfc). 

Proof. See Luenberger [?, p. 389]. □ 

2.2. Early termination of the subproblems. Poor values of a^, y^, or pk 

may imply subproblems whose accurate solutions are far from a solution of (GNP). 
We therefore terminate subproblems early by relaxing (2. Id) and (2.1e) by an amount 
Wfe. However, we enforce the nonnegativity condition on x (implied by (2. Id)): 



x, v, w > (2.3a) 

c k (x)+v-w = Q (2.3b) 

WC k (x) - JlAy = z (2.3c) 

||min(x, z)!!^ < uj k (2.3d) 

HA/lloo <o-k+u; k . (2.3c) 



Each subproblcm is required to return a solution satisfying the linear and non- 
negativity constraints and, as discussed in connection with (1.6a), in practice (2.3a) 
and/or (2.3b) would be relaxed by a fixed tolerance 5. 

3. The Stabilized LCL Algorithm. Algorithm 2 outlines the stabilized LCL 
method. Its structure closely parallels the BCL algorithm described in [?]. Based on 
the current primal infcasibility, each iteration of the algorithm is regarded as cither 
"successful" or "unsuccessful." In the "successful" case, the solution estimates are 
updated by using information from the current subproblem solution. If the iteration 
is "unsuccessful," the subproblcm solutions are discarded, the current solution esti- 
mates are held fixed, and the penalty parameter pk is increased in an effort to reduce 
the primal infcasibility in the next iteration. In order for the linearized constraints 
not to continue interfering with the penalty parameter's ability to reduce the primal 
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Algorithm 2: Stabilized LCL. 



Input: x ,y ,z 
Output: x*,y*,z* 

[Initialize parameters] 

Set a > a > 0. Set constants rp,r CT > 1. Set the initial penalty parameters 
po > 1 and (To 3> 1. Set positive convergence tolerances ui t ,rj, <C 1 and initial 
tolerances uio > lo, and rjo > rj r . Set constants a,/3 > with a < 1; 
k <- 0; 

converged <— false; 
repeat 

Choose dj fe > a;* such that lim^oo o; fc = w*; 
[Solve the LC subproblem] 

Solve (ELCfc) to obtain a point (x* k , Ay k , z k ) satisfying (2.3). If there is 
more than one such point, compute the one closest in norm to (xk,0,Zk); 

if ||c(a;£)|| < max(?7*,?7fc) then 

[Update solution estimates] 

x k+1 <- x* k ; 

Vk+i <- Vt ~ Pkc(xl) (=y{x* k ,yl,p k )); [or y k +i <- y* k ] 

z k +i <- 4; [or 2 fc+ i <- - Jk+m+i] 

[Update penalty parameter and elastic weight] 

pk+i <- pfc; [keep p fc ] 

a fe+ i <- ma,x{a,mm(\\Ayl\\ x ,'a)}; [reset cr fe ] 

[Test convergence] 

^_ if (a;fe+i, J/fc+i, 2fe+i) satisfies (1.7) then converged <— true; 
r/fc+i <- Vk/Pk+u [decrease rj k ] 

else 

[Keep solution estimates] 

l_ Xk+i <— a;fc; j/fc+i <— s/fc ; 2fe+i <— 2*;; 
[Update penalty parameter and elastic weight] 

Pk+i *- T p p k ; [increase p k ] 

o-fe+i <— o k jT a \ [decrease cr fc ] 

??fc+i <— Vo/Pk+u [ ma y increase or decrease ??fc] 

+ i; 
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k^k 
until converged; 

x* < x k \ y* < yk\ z* 
return x„, y*, z*; 



Zk\ 



infcasibility, the algorithm relaxes the linearizations by reducing the clastic penalty 
parameter crfe. 

The two salient features of this algorithm are that it is globally convergent and 
that it is asymptotically equivalent to the canonical LCL method. In §3.1 we demon- 
strate the global convergence properties of the algorithm by proving results analogous 
to Lemma 4.3 and Theorem 4.4 in [?]. In §3.2 we demonstrate that the algorithm 
eventually reduces to the canonical LCL method and hence inherits that method's 
asymptotic convergence properties. 
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3.1. Global convergence properties. We make the following assumptions. 
Assumption 3.1. The sequence of iterates {x^.} lies in the closed and bounded 
setBcW 1 . 

Assumption 3.2. The matrix J(x*) has full row rank at every limit point x* of 
the sequence {x k } . 

The first assumption guarantees that any sequence of iterates generated by the 
algorithm always has some convergent subsequence. The second assumption is com- 
monly known as the linear independence constraint qualification (LICQ) (see, for 
example, Mangasarian [?], or for a more recent reference, Nocedal and Wright [?]). 

Let x* be any limit point of the sequence {x* k }. At all points x for which J{x) 
has full row rank we define the least-squares multiplier estimate, y(x), as the solution 
of the linear least-squares problem 

minimize \\g(x) — J(x) T y\\ 2 . (3.1) 

v 

Note that the definitions of <?, J, and hence y require a priori knowledge of the 
bounds active at a;*. We emphasize that y is used only as an analytical device and 
its computation is never required. Assumption 3.2 guarantees the uniqueness of y at 
every limit point of the sequence {x* k }. 

3.1.1. Convergence of LC subproblem solutions. In this section we prove 
that the sequence of LC subproblem solutions generated by Algorithm 2 converges to 
a KKT point of (GNP). 

We need the following lemma to bound the errors in the least-squares multiplier 
estimates relative to the error in Xk- The lemma simply demonstrates that y(x) is 
Lipschitz continuous in a neighborhood of x* . 

Lemma 3.3. Let {xk}, k € JC be a subsequence converging to a;* and suppose 
that Assumptions 1.1 and 3.2 hold. Then there exists a positive constant a such that 
\\y{ x k) — y{x*)\\ < a\\xk — a;* || for all k e JC sufficiently large. 

Proof. See Lemmas 2.1 and 4.4 of [?]. □ 

To prove the global convergence properties of Algorithm 2, we first describe the 
properties of any limit point that the algorithm generates. We are not claiming (yet!) 
that the algorithm is globally convergent, only that if it does converge, then the set of 
limit points generated must satisfy some desirable properties. The following lemma 
is adapted from Lemma 4.4 of [?]. 

Lemma 3.4. Let {cut} and {pk} be sequences of positive scalars, where ujk — * 0. 
Let {xk} be any sequence of n-vectors and {yk} be any sequence of m-vectors. Let 
{(x* k ,Ay k ,z k )} be a sequence of vectors satisfying (2.3a), (2.3c), and (2.3d). Let x* be 
any limit point of the sequence {x k }, and let JC be the infinite set of indices associated 
with that convergent subsequence. Suppose that Assumptions 1.1, 3.1, and 3.2 hold. 
Set y%=yk + Ay%, yk = y(x* k ,yl, p k ), and = y(x*). The following properties then 
hold: 

1. There are positive constants ct\, a-i, and M such that 

\\Vk - y*\\ <Pi = a i^k + M\\x* k - x k \\ \\yl - y k \\ + a 2 \\x* k - x*\\, (3.2) 
Pk\\c{x%)\\ < (3 2 = Pi + \\yl - y k \\ + \\Vk- V*l (3.3) 

for all k e JC sufficiently large. 

2. As k G JC gets large, if \\yl — Vk\\ — * 0, or if \\y k — yk\\ is bounded and 
\\x* k — x k \\ — ► 0, then 

y k —>y* and z k — > z* = V x C{x^ , y*, 0). 
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3. If, in addition, c* = 0, then (x*,y*,z*) is a first- order KK point for (GNP). 
Proof. From the definition of y{x* k ), the least-squares multiplier estimates, 

l|y(4) - foil = UJKWxtFyiJixmixt) - fo|| 

= \\{J{xl)J{xl) T )-'j{x%){g{xl) - J{xl) T y k )\\ (3.4) 
< \\(JK)J(xlf)-'j(xt)\\ ■ \\g(x* k ) J(x* k ) T y k \\- 

By assumption, J(x») has full row rank. Continuity of J then implies that 

{J{xl)J{xl) T )^J{xD 
exists for all k £ K, large enough. Then there exists a positive scalar ct\ such that 

\\{J(xi)j{xi) T )^j{xl)\\<^=, (3.5) 

where n is the dimension of the vector x. Substituting (3.5) into (3.4), 

-y fc || < ^\\g(x* k )-J(x* k ) T y k \\. (3.6) 

We now show that ||g(a;Jl) — J{x* k ) T y k \\ is bounded. By hypothesis, {x* kl y k ,z k ) satis- 
fies (2.3c). Together with (1.3), 

z* k = VC k {x%) - JlAy* k 

= 9(x*k) - J{x* k ) T {y k - p k c{x* k )) - J k Ay* k 

t (3-7) 
= .9(4) - J( x *k) T (yk + Ay* k - Pkc{x* k )) + (J(x* k ) - J k ) Ay* k 

= a{x* k ) - J(x* k ) T y k + (J(x* k ) - J k ) T {y* k - y k ), 

where y* k d =y k + Ay* k and y k d =y(x* k ,yl, p k ) = y* k - p k c(x* k ). For k £ JC large enough, 
x* k is sufficiently close to x* so that 

||[4]x||<||min(4,4)||, (3.8) 

where I is the index set of inactive bounds at x* kl as defined in §1.6. Because x* k and 
z k both satisfy (2.3d), (3.8) implies that 

||[*iE]x||<vW. (3.9) 

Combining (3.7) and (3.9), 

\\g{xt) - J(xl) T y k + (j(x* k ) - J k ) T {y* k - y k )\\ < ^ u> k . (3.10) 
But, from the triangle and Cauchy- Schwartz inequalities, we have 

\\g(x* k ) - J{xl) T y k \\ < \\g(x* k ) J(x* k ) T y k + (j(x%) J k ) T (y* k y k )\\ 

+ \\J(xt)-J k \\\\y* k -y k \\. (3.11) 

Also, the continuity of J implies that there exists a positive constant M such that 
\\J(x* k ) - Jfell < M^\\x* k - x k \\. Together, (3.11) and (3.10) imply that 

\\g{x* k ) - J{x* k fy k \\ <Vn~u k + M^\\x* k - x k \\\\y* k - y k \\, (3.12) 
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and so we have derived a bound on ||.g(x£) — J(x£) T y fe ||, as required. 
We now derive (3.2). From the triangle inequality, 

\\yk - y*\\ < \\y( x t) - yd + l|y(4) - y*\\- ( 3 - 13 ) 

Using inequality (3.12) in (3.6), we deduce that 

\\y(x* k )-y k \\ < a 1( j k + M\\x* k - x k \\\\y* k - y k \\, (3.14) 
and Lemma 3.3 implies that there exists a constant ai such that 

\\y( x k) ~y*\\ < "2114 -a;* ||, (3.15) 

for all k e fC large enough (recall that = y(x*)). Substituting (3.14) and (3.15) 
into (3.13), we obtain \\y k — y*|| < (3\ as stated in (3.2). 

We now prove (3.3). From the definition of y k , rearranging terms yields 

Pkc(x* k )=y* k -y k . (3.16) 

Taking norms of both sides of (3.16) and using (3.2) yields 

Pfe||c(4)ll = hk - Vk\\ 

= hk - y* + y* - yk + y* k - yd 
< \\Vk -y*\\ + hk -y*\\ + hi - yd 
<Pi + \\yk-y*\\ + \\yt-yk\\ 

and so Part 1 of Lemma 3.4 is proved. 

Now suppose that \\y k — y k \\ — > as k € IC goes to infinity. Because {x k } and 
{x k } are in the compact set B, \\x k — x k \\ is bounded. We conclude from (3.2) that 
Vk — >• y* as k e JC goes to infinity. We also conclude from the continuity of J that 
|| J(x k ) — J k \\ is bounded, so that 

lim||(JK)- Jkf(y* k -yk)\\ =0. (3.17) 

On the other hand, suppose that || y k — y k \\ is uniformly bounded and that limfe e ^ \\x* k — 
x k \\ = 0. We then conclude from (3.2) that y k — > y* as k e JC goes to infinity and (3.17) 
holds. Because \\vcL keK {x* k ,y k ) = (x*,y*), 

9{x* k ) ~ J(x* k ) T yk -» .9* - Jjy*, 

and so (3.7) and (3.17) together imply that 

z k — > z» = Vcc^^*,?/*^) (3.18) 

as fc e /C goes to infinity. Thus we have proved Part 2 of Lemma 3.4. 
Now suppose that 

c*=0. (3.19) 
Each xj; and zj* satisfies (2.3d). Then X\m k ^{x* k , z k ) = (x*,z»), and io k — > implies 

min(a;*, z») = 0. (3.20) 
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Therefore, (3.18)-(3.20) imply that (x„, y*, z*) satisfies (1.5) and so it is a first-order 
KKT point for (GNP). Part 3 is thus proved, and the proof is complete. □ 

The conclusions of Lemma 3.4 pertain to any sequence {{x k , Ay k , z k )} satisfying 
the approximate first-order conditions (2.3). Algorithm 2 generates such a sequence 
and also generates auxiliary sequences of scalars {wfe}, {pk}, and {(Jk\ in such a way as 
to guarantee that the hypotheses of Lemma 3.4 hold. We demonstrate in Theorem 3.6 
that the condition of Part 3 of Lemma 3.4 holds. Therefore, every limit point of the 
sequence {{x* kl yu, z k )} is a first-order KKT point for (GNP). 

3.1.2. Convergence of ||2/fe || / Pfc- Before laying out the global convergence prop- 
erties of the stabilized LCL method, we need to show that if pk — > oo then the quotient 
\\Vk\\l Pk converges to 0. This property is required (and used by Conn et al. [?, ?]) in 
lieu of assuming that \\yk\\ remains bounded. 

Lemma 3.5. Suppose that pt — > oo as k increases when Algorithm 2 is executed. 
Then \\yk\\/pk -> 0. 

Proof. The multiplier update of Algorithm 2 (Step 5) is yk+i <— y% — Pkc{x k ). 
If this is replaced by yk+i <— yk — Pkc(x k ), Lemma 4.1 of Conn et al. [?] applies. 
The construction of the forcing sequence r\k (Steps 8 and 11) is therefore sufficient 
to guarantee that \\yk\\/pk — * 0. Note that the norm of the difference between the 
two updates is given by \\y k — yk\\ = \\Ay k \\. This difference is bounded because Ay k 
satisfies (2.3e). Therefore, \\yk\\l Pk ^ in Algorithm 2 as p k — > oo. □ 

3.1.3. Main convergence result. With Lemmas 3.4-3.5 in hand, we are now 
able to prove global convergence of the stabilized LCL method. 

Theorem 3.6. Let {(x k ,y k , z k )} be the sequence of vectors generated by Algo- 
rithm 2 with tolerances oj* = and 77, = 0. Let x* be any limit point of the sequence 
{x£} and let JC be the infinite set of indices associated with that convergent subse- 
quence. Then, under the assumptions of Lemma 3.4, Parts 1, 2, and 3 of that lemma 
hold. 

Proof. Algorithm 2 generates positive scalars pk and, by Steps 1, 8, and 11, gener- 
ates positive scalars ujk — > and rjk — > 0. Step 2 of the algorithm generates a sequence 
{(4>2/fc>4)}> where Vk = Vk + Ay* k . Each (x* k , Ay* k , z* k ) satisfies (2.3). Therefore, the 
hypotheses of Lemma 3.4 hold, and Part 1 of the lemma follows immediately. 

Note that each x k satisfies (2.3a), and so x k > for all k. Thus, > 0. 
Moreover, because r CT > 1 and a is finite, Steps 6 and 10 of Algorithm 2 ensure that 
<7fc is uniformly bounded. We then need to consider the four possible cases. For all 

keJC, 

1. pk is uniformly bounded, and Ok — > as k gets large; 

2. pk is uniformly bounded, and Ok is uniformly bounded away from zero; 

3. pk — > 00 and o~k — > as k gets large; 

4. pk — > 00 and <7k is uniformly bounded away from zero. 
For the remainder of this proof, we consider only k e JC. 

We dismiss Case 1 because it cannot be generated by the algorithm. (As k gets 
large, — > only if Step 10 is executed infinitely many times, contradicting the 
finiteness of pk ) 

Case 2 implies that Step 6 of Algorithm 2 is executed for all k large enough. 
Thus, Xk+i = x k for all large k, and hence x k — > x» implies Xk — » x*. Therefore, 
\\ x l — x k\\ — * 0- Because each Ay k satisfies (2.3e), y k satisfies 



\\Uk - Vk\\oc <U k + (Jfe. 



(3.21) 
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Because u\. and ui k are uniformly bounded, Part 2 of Lemma 3.4 holds. In addition, 
||c(x^)|| < rjk for all k large enough, and so rjk — ► implies that c(x£) — > 0. By 
continuity of c, c* = 0. Thus, Part 3 of Lemma 3.4 holds. 

Now consider Case 3. Because Gk — > and — > 0, (3.21) implies that \\yl~ yk\\ — > 
as k increases. Then Part 2 of the lemma holds. To show that c(x* k ) — ► 0, divide 
both sides of (3.3) by pfe to obtain 

||c(4)ll < ^ + - vk\\ (M\\xt - Xk \\ + 1) + -x4 + hy k - y.\\ ■ 

^_Pk_^ Pk ^ / Pk Pk 

(a) (6) (c) ""(d) 

Term (a) clearly goes to zero as increases. Because y k and y^ satisfy (3.21), and 
because x£ and x k belong to the compact set B, (b) and (c) go to zero as pk increases. 
By Lemma 3.5, ||yfc||/pfc — > 0, and so (d) goes to 0. We conclude that ||c(;r£)|| — > as 
fc increases, as required. 

In Case 4, both Steps 6 and 10 are executed infinitely often. But because is 
uniformly bounded, so is \\y* k — yk\\- As in Case 2, \\x k — Xk\\ — > as fc get large, and 
so Part 2 of Lemma 3.4 holds. The rest of the analysis for this case is the same as for 
Case 3. □ 

3.1.4. Finite termination. Note that the convergence test takes place only if 
Step 3 of Algorithm 2 tests true; i.e., if ||c(x£)|| < i]k (because 77, = 0). To guarantee 
that the algorithm will eventually terminate as the iterates Xk, J/fe, and Zk converge, 
we need to guarantee that Steps 4 and 7 execute infinitely often. The forcing sequence 
i]k is intimately tied to this occurrence. For example, if r/k = 0, then we would not 
normally expect Step 3 to evaluate true (except in rare occasions when c{x k ) = 0). 
The forcing sequence defined by Steps 8 and 11 of Algorithm 2 is suggested by Conn 
ct al. [?, ?]. The following corollaries show that this forcing sequence has the desired 
property and summarize the global convergence properties of Algorithm 2. Unlike for 
the previous results in this section, we now need to strengthen our assumptions and 
require that only a single limit point exist. 

COROLLARY 3.7 (Global convergence). Let {{xk,yk, %k)} be the sequence of vec- 
tors generated by Algorithm 2. Let x* be the single limit point of the sequence {x^}. 
Suppose that Assumptions 1.1, 3.1 7 and 3.2 hold. Then 

lim (x k ,yk,Zk) = (x*,y*,z*), 

k^oc 

and (x*, y*, z#) is a first-order KKT point for (GNP). 

Proof. Let {(x k , y^, z k )} be the sequence of vectors generated by Step 2 of Algo- 
rithm 2 and set t/k — y{x%, y^, Pk)- By Lemma 3.4 and Theorem 3.6, 

lim yk — y* and lim z k = z*. 

k^co k^oo 

Moreover, (a;*, y*, z*) is a first-order KKT point for (GNP). Suppose Step 4 is executed 
infinitely often. The result then follows immediately because Xk, yk, and Zk are 
updated infinitely often and form a convergent sequence from x* k , yk, and z k . 

We now show by contradiction that Step 4 does occur infinitely often. Suppose 
instead that it does not. Then there exists a k\ large enough so that Steps 9 and 10 
are executed for all k > k\. Consider only iterations k > k\. Then yk = y and 
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Pk — * oo. From (3.16), 

Pk\\c(x* k )\\ = M-ykW 

= \\(Vk-v)+V-Vk\\ (3-22) 

< hi -y\\ + \\y\\ + \\yk\\- 

The vector y k = y k + Ay k and each Ay k satisfies (2.3e). Thus, \\y k — y\\ < a k + 
uj k . Moreover, lim^oo y k = y* and y* is bounded (see Assumption 3.2). Then, 
from (3.22), there exists some constant L > 0, independent of k, such that 

Pk \\c(x* k )\\ <L (3.23) 

for all k. But the test at Step 3 fails at every iteration, so that 

Vk < \\c(xt)\\. (3.24) 

Combining (3.23) and (3.24), we find that 

PkVk < Pk\\c(x* k )\\ < L. (3.25) 

From Step 11, r] k+1 = %/Pfc+i. so 

PkVk = Pk^ = VoPl~ a - (3-26) 
Pk 

Substituting (3.26) into (3.25), we find that Vopl~ a < L f° r au This is a contra- 
diction under the hypothesis that a < 1 and p k — > oo. Therefore, Step 4 must occur 
infinitely often. □ 

The following result simply asserts that Algorithm 2 will eventually exit when, as 
in practice, w* and 77* are positive. 

Corollary 3.8 (Finite Termination). Suppose that the convergence tolerances 
uj* and 77* are strictly positive. Then, under the assumptions of Corollary 3.7, Algo- 
rithm 2 terminates after a finite number of iterations. 

Proof. Let {{x* k ,y k , z k )} and be as defined in Theorem 3.6. Set y k = yk(x k ,y k , Pk)- 
By that theorem, 

lim y k = y* 

k^oo 

lim z* k = 2;* d =V x £(a;*,y*,0), 

k— >oc 

and (x*,y»,z») is a first-order KKT point for (GNP). Then, (a;*,y*,2:*) must sat- 
isfy (1.7). By the continuity of c, lim^oo ||c(x£)|| — > c* = 0, and because 77, > 0, 

ll c 04)ll < V* < max(?7 fe ,77*) 

for all k e /C large enough. Consequently, Step 7 is executed infinitely often and 

lim (x k ,y k , Zk) — > (x*,y*, z*). 

k^oc 

Because > and 77, > 0, {x k ,y k ,z k ) satisfies conditions (1.7) for some k large 
enough. □ 
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3.2. Local convergence properties. In this section we show that the sta- 
bilized LCL algorithm preserves the local convergence characteristics of Robinson's 
original LCL algorithm. Moreover, it can retain fast local convergence under inexact 
solutions to the subproblems. 

Bcrtsekas [?] and Conn et al. [?, ?] show how to construct a forcing sequence {rjk} 
to guarantee that ||c(x£)|| < rjk will eventually always be true so that the iterates Xk, 
yk, and Zk are updated (see Step 4 of Algorithm 2) for all iterations after some k large 
enough. The penalty parameter pk then remains uniformly bounded — an important 
property. These results rely on a relationship between ||c(x£)|| and pk, namely (3.3). 
We know from the BCL convergence theory that the convergence rate approaches 
super linear as pk grows large (cf. [?] and [?, ?]). Because rjk is reduced at a sublincar 
rate, ||c(x£)|| will eventually go to zero faster than rjk, at which point it is no longer 
necessary to increase pk- Thus, we can be assured that Algorithm 2 does not increase 
Pk without bound. 

Bcrtsekas suggests constructing the sequence rjk as 

??fc+1 = 7 || C (x*)||, (3.27) 
for some 7 < 1. Within Algorithm 2, this would lead to the following update rule: 



Pk+l — 




if || C (x*)|| < 7 ||c(x fe )|| 
if || C (x*)|| > 7 ||c(x fe )||. 



As pk gets larger, the convergence rate gets arbitrarily close to superlinear, so that the 
first case of (3.28) is always satisfied, and pk becomes constant for all k large enough. 
We prefer not to use rule (3.27) because it may be too strict. Any intermediate (and 
nonoptimal) iterate x* k could be feasible or nearly feasible for (GNP), so that ||c(x£)|| 
could be very small. Then rjk+i would be smaller than warranted on the following 
iteration. The forcing sequence suggested by Conn et al. [?, ?] does not suffer from 
this defect and has been proven by them to keep pk bounded. We have used this 
update in Algorithm 2 (see Steps 8 and 11). 

For this analysis and the remainder of this section, we assume that pk is uniformly 
bounded, so that Pk = P for all k greater than some k. Hence, we drop the subscript 
on pk and simply write p. We consider only iterations k > k. 

We begin by discussing the local convergence rates of the Algorithm 2 under the 
assumption that the elastic variables are always zero — that is, the linearized con- 
straints are always satisfied. Next, we show that after finitely many iterations the 
elastic penalty parameter Uk will always be large enough to guarantee that this as- 
sumption holds. In this way, we demonstrate that stabilized LCL becomes equivalent 
to MINOS (and to canonical LCL) as it approaches the solution. 

3.2.1. Convergence rates. Robinson's [?] local convergence analysis applies 
to the canonical LCL algorithm under the special case in which pk = (cf. (2.2)) 
and each subproblcm is solved to full accuracy (i.e., LOk = 0). He proved that one 
can expect fast convergence from a good enough starting point. In particular, under 
Assumptions 1.1, 1.5, and 3.2, we can expect an R-quadratic rate of convergence (see 
Ortega and Rhcinboldt [?] for an in-depth discussion of root-convergence rates). For 
a sufficiently good starting point, Robinson [?] proves that the subproblems (LCfc) 
are always feasible. He also shows that near a solution, the solutions to the LC 
subproblems, if parameterized appropriately, form a continuous path converging to 
(x* , y* , z*) . 
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In a later paper, Brauninger [?] shows how the fast local convergence rate can be 
preserved with only approximate solutions of the subproblems (again, with pk = 0). 
The subproblems are solved to a tolerance that is tightened at a rate that matches 
the decrease in the square of the primal and dual infeasibilities. Our proposed LCL 
algorithm uses a similar strategy. 

Robinson's local convergence analysis also applies to the canonical LCL algorithm 
when pk = p > 0. One can see this by considering the following optimization problem: 

minimize f(x) + ^p\\c(x)\\ 2 

X 

subject to c[x) =0, x > 

The solutions of (3.29) are identical to the solutions of (GNP). The Robinson LCL 
subproblem objective corresponding to (3.29) is given by 

R k (x) = f{x) + \p\\c(x)\\ 2 - V T k c(x). 

The canonical LCL subproblem objective is Ck(x) = C(x,yk, Pk), an d so Ck{x) = 
Rk (x) for all k because pk = p- We then observe that the canonical LCL subproblem 
corresponding to (GNP), with a penalty parameter pk = p, is equivalent to the Robin- 
son LCL subproblem corresponding to problem (3.29), with pk = 0. The convergence 
characteristics of the canonical LCL algorithm are therefore the same as those demon- 
strated by Robinson [?]. (However, while the asymptotic convergence rate remains 
R-quadratic, we expect a different asymptotic error constant.) 

Under the assumption that the elastic variables are always equal to and that p 
is finite, the steps executed by Algorithms 1 and 2 are identical, and the subproblems 
(ELCfe) and (LC&) are also identical. The only difference is the multiplier update 
formulas: 

Canonical LCL update Vk+i — Uk (3.30a) 
Stabilized LCL update Vk+i = Uk ~ P c ( x l)i (3.30b) 

which differ only by the vector pc{x* k ). We may think of this vector as a perturbation 
of the LCL multiplier update (3.30a). Moreover, Robinson [?] shows that this pertur- 
bation converges to at the same rate as {x k } converges to x*. Therefore, it does not 
interfere with the convergence rate of the stabilized LCL iterates. Robinson's local 
convergence analysis then applies to the stabilized LCL method. 

We summarize the convergence results in Theorem 3.9. Note that the function 

c(x) 

V x C(x,y,p) - z 
min(a;, z) 

captures the first-order optimality conditions of (GNP), in the sense that 

F(x»,y*,z*) = 

if and only if (a;* , y* , ) is a first-order KKT point for (GNP). Thus, \\F(x,y, z)\\ is 
a measure of the deviation from optimality. For the next theorem only, define 




and F(r) = F(x, y, z). 



(3.29) 



F{x,y,z) = 
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Theorem 3.9 (Robinson [?]; Brauninger [?]). Suppose Assumptions 1.1-1.5 
and 3.2 hold at r*. Moreover, suppose to k — 0(\\F(r k )\\ 2 ) for all k > 0. Then there 
is a positive constant 5 such that if 

\\r - r*|| < 6, 

the sequence {r k } generated by Algorithm 2 converges to r*. Moreover, the sequence 
converges R-quadratically, so that for all k > 0, 

\\r k -r*\\ < Q{\f k (3.31) 
for some positive constant Q. Also, 

||r fc+1 -r fc ||<M||F(r fc )|| (3.32) 

for some positive constant M . 

Robinson does not state (3.32) as part of a theorem, but it is found in the proof 
of (3.31). 

3.2.2. Asymptotic equivalence to MINOS. Much of the efficiency of LCL 
methods, including MINOS, derives from the fact that they eventually identify the 
correct active set, and each subproblem restricts its search to the subspace defined by 
a linear approximation of the constraints. This approximation can be very accurate 
near the solution. The stabilized LCL subproblems do not restrict themselves to this 
subspace. In early iterations we do not expect, nor do we wish, the method to honor 
these linearizations. The elastic variables give the subproblems an opportunity to 
deviate from this subspace. In order to recover LCL's fast convergence rate, however, 
it is not desirable to allow deviation near the solution. 

We show below that as the stabilized LCL iterations approach a solution of 
(GNP), the solutions of the stabilized LCL subproblems eventually always have the 
clastic variables equal to zero. Hence, c k {x k ) = and v k ,w k — 0, so that each x k 
satisfies the constraints of the canonical LCL (and MINOS) subproblem, and the 
objective of (ELCfc) at {x k ,v k ,w k ) is equivalent to (LCfe). 

As discussed in §3.2.1, Theorem 3.9 applies to Algorithm 2, and so for all k large 
enough, (3.32) yields 

\\y* k -y k \\<M\\F(x k ,y k ,z k )\\, (3.33) 

where y k = y k -\-Ay k and M is some positive constant. By Corollary 3.7, (x k , y k , z k ) — * 
(a;*,2/*, z*), and because ||F(x*, y», z*)|| = and F is continuous, 

\\F(x k ,y k ,z k )\\ < (3.34) 

for all k large enough (a is defined in Algorithm 2). Combining (3.33) and (3.34), we 
conclude that \\y k — y k \\ < a, or equivalently, 

||^||<2 (3.35) 

for all k large enough. However, Step 6 of Algorithm 2 guarantees that a < a k for all 
k, and so from (3.35), < a k for all k large enough. Lemma 2.1 then implies 

that a k will be sufficiently large that the optimal elastic variables will be equal to 0. 
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3.3. Infeasible problems. Not all optimization problems are feasible. The user 
of an optimization algorithm may formulate a set of nonlinear constraints c{x) = for 
which no nonnegative solution exists. Detecting infeasibility of the system c(x) = 0, 
x > 0, is equivalent to verifying that the global minimizer of 

minimize i||c(a;)|| 2 

2 (3.36) 
subject to x > 

yields a positive objective value. Detecting such infeasibility is a useful feature, but 
it is a very difficult problem and is beyond the purview of this paper. 

We analyze the properties of the stabilized LCL algorithm when it is applied 
to an infeasible problem with convergence tolerances w„ = 77* = 0. We show that 
Algorithm 2 converges to a point that satisfies the first-order optimality conditions of 
the minimum-norm problem (3.36). 

Theorem 3.10. Let x* be any limit point of the sequence of vectors {x k } gen- 
erated by Algorithm 2, and let JC be the infinite set of indices associated with that 
subsequence. Suppose that (GNP) is infeasible. Then, under the assumptions of 
Lemma 3.4, 

lim J{x* k ) T c{x%) = z* d = jjc*, 

kEK 

and (x*,z*) is a first- order KKT point for (3.36). 

Proof. The pair (a;*, 2*) satisfies the first-order KKT conditions of (3.36) if 

. , J * C \ = ^ (3-37) 
min(a;*,2:*) = 0. 

Because (GNP) is infeasible, there exists a constant 8 > such that 8 < ||c(x)|| 
for all x > 0. Moreover, Steps 8 and 11 of Algorithm 2 generate a sequence {rjk} 
converging to 0, and so rjk < 8 for all k large enough. Consider only such k. Then, 
rjk < 8 < \\c(xl)\\, and Step 10 is executed at every k, so that pu — > 00 and — > 0. 
Moreover, xu and yt are not updated, so that for some n-vector x and m-vector y, 

x k = x and y k = y. (3.38) 

Note that Algorithm 2 generates x* k satisfying (2.3). Therefore, x* k > for all k, 
and so lim feeK: x* k = x. t implies 

> 0. (3.39) 

From (2.3d), (3.7), and (3.38), 

g(x* k ) - J(xl) T {y - Pk c(xD) - J{x) T Ayl > -u k e, (3.40) 
or, after rearranging terms, 

g(x* k ) ~ J(x* k ) T y - J(x) T Ay* k +p k J{x* k ) T c^l) > ~u k e. (3.41) 

v J v J 

V V 

(a) (6) 



By hypothesis, all iterates x* k lie in a compact set, and so (a) is bounded because g 
and J are continuous and y is constant. Also, (b) is bounded because x, is constant, 
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and from (2.3e) we have llzAy^H^ < u k + ui k . Then, because oj k — > and p k — > oo, 
(3.41) implies that J(:r£) T c(a;£) > for all k large enough. Otherwise, (3.41) would 
eventually be violated as p k grew large. Then, 

z^ f lim J{xl) T c{xl) = Jjc* > 0. (3.42) 
All a;£ lie in a compact set, there exists some constant L > such that 

IK-*H<^> (3-43) 

where M and ai are as defined in Lemma 3.4 and n is the number of elements in the 
vector x* k . Substituting (3.43) into (3.12) and using (3.21), we have 

||.9(4) - J(x* k ) T y* k + Pk J(xl) T c(x* k )\\ < V^Wk + L(a k + u> k )}. (3.44) 
Dividing (3.44) through by p k , we obtain 
1 



(g(x* k ) - J(xl) T y* k ) + J{xl) T c{x* k ) 

Pk 



< V^{uJ k + L(<7 k +UJ k )} 

Pk 



The quantity g(x k ) — J{x k ) T y k is bounded for the same reasons that (a) and (b) above 
are bounded. Taking limits of both sides of (3.45), p k — > oo and uo k ,a k — > imply 
that J(4) T c(x£) — > 0. By continuity of J and c, J„ T c* = 0. Equivalently, we may 
write 

[Jjc*] 3 = if > 0, (3.46) 

for j = l,...,n. Therefore (3.39), (3.42) and (3.46) together imply that (x*,z*) 
satisfies conditions (3.37), as required. □ 

Theorem 3.10 describes a useful feature of Algorithm 2. When applied to an 
infeasible problem, the algorithm converges to a solution of (3.36) — or at least to a 
first-order point. One important caveat deserves mention: if the convergence tolerance 
77* is small (it usually will be), Algorithm 2 may never terminate. We need to insert 
an additional test to provide for the possibility that (GNP) is infeasible. For example, 
the test could force the algorithm to exit if p k is above a certain threshold value and 
||c(xji)|| is no longer decreasing. Any test we devise is necessarily heuristic, however; 
it is impossible to know for certain whether a larger value of p k would force ||c(x£)|| 
to be less than 77*. We discuss this point further in §4.6. 

3.4. Second-order optimality. The stabilized LCL method imposes few re- 
quirements on the manner in which the LC subproblems are solved. Our implemen- 
tation (see Section 4) uses MINOS or SNOPT to solve the LC subproblems. These 
are active-set solvers suitable for optimization problems with few expected degrees of 
freedom at the solution and in which only first derivatives are available. However, 
second derivatives might be readily available for some problems. Also, some problems 
are expected to have many degrees of freedom at the solution. In cither case, an 
interior-point solver (requiring second derivatives) may be more appropriate for the 
solution of the subproblems. 

Lemma 3.4 and Theorem 3.6 assert that iterates generated by the stabilized 
LCL algorithm converge to first-order KKT points. A subproblem solver that uses 
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second-derivatives may be able to guarantee convergence to second-order points. If 
we augment the convergence criteria for the solution of each subproblem to include 
second-order conditions, we can show that Algorithm 2 generates iterates converging 
to points satisfying the second-order sufficiency conditions for (GNP). The following 
assumption strengthens the first-order conditions (2.1). 

Assumption 3.11. Let x* be any limit point of the sequence {x k }, and let JC be 
the infinite set of indices associated with that convergent subsequence. For all k G JC 
large enough, the following conditions hold at each {x* k ,y k ,z k ): For some 5 > 0, 
independent of k, 

1. (Strict Complementarity) 

max(4,zj) > <5e; (3.47) 

2. (Second-Order Condition) For any p > 0, 

p T V 2 xx C{xi,yl,p)p>5 (3.48) 

for all p ^ satisfying 

J{x* k )p = and \p]j = for all j such that [x%]j = 0. (3.49) 

Condition (3.48) implies that the reduced Hessian of C is uniformly positive definite 
at all x* k . 

The following result extends Theorem 3.6 to consider the case in which iterates 
generated by Algorithm 2 satisfy Assumption 3.11. Conn et al. [?] show a similar 
result for their BCL method. 

Theorem 3.12. Suppose that Assumptions 1.1, 3.1, 3.2, and 3.11 hold. Let 

{(x k ,y k , z k )} be the sequences of vectors generated by Algorithm 2. Let x* be any 
limit point of the sequence {x k }, and let JC be the infinite set of indices associated 
with that convergent subsequence. Set y k = y{x* kl y k , pk). Then 

, lim .(4»2/fc>^) = {x*,y*,z*) (3.50) 

and (a;*, y*, z*) is an isolated local minimizer of (GNP). 

Proof. It follows immediately from Theorem 3.6 that (3.50) holds and that 
(x*,y*,z*) is a first-order KKT point for (GNP). It only remains to show that 
(x*,y*,z*) satisfies the second-order sufficiency conditions (see Definition 1.4). 

By hypothesis, x\ and z k satisfy Part 1 of Assumption 3.11 for all k e JC. There- 
fore, their limit points satisfy 

max(j;»,z») > Se > 0, 

and so X* and satisfy strict complementarity (Definition 1.3). We now show that 
x* and y* satisfy the second-order sufficiency conditions for (GNP). 

Let p be any nonzero vector satisfying (3.49) for all k € JC large enough. Then 

m 

p T Wl x C(x* k ,y* k , Pk )p = p T (H(x* k ) - Y}yk]iHi{xl))p (3.51) 

i=l 

for all k e JC large enough. Part 2 of Assumption 3.11 and (3.51) imply that 



m 

p T (H(x* k )~Y,^W x k))P>^ 
i=i 



(3.52) 
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where S is some positive constant. If we take the limit of (3.52), the continuity of H 
and Hi (see Assumption 1.1) and (3.50) imply that 

m 

p T Vl x £(x*,y*, p)p = P T (H(x«) -J2ly*]i H i( x *))P >5>0 (3.53) 

i=i 

for all p > and for all p ^ satisfying (1.9). Therefore, (a;*, t/*, z*) satisfies the 
second-order sufficiency conditions for (GNP), as required. □ 

4. Implementation. The practical implementation of an algorithm invariably 
requires many features that are not made explicit by its theory. In this section we dis- 
cuss some important details of our implementation of the stabilized LCL method. The 
algorithm has been implemented in Matlab, version 6 [?] and is called LCLOPT. It 
uses the Fortran codes MINOS [?, ?] and SNOPT [?] to solve the linearly constrained 
subproblcms. We now turn our attention back to the more general problem (NP), 
first presented in §1.1, and leave (GNP) behind. 

4.1. Problem formulation. LCLOPT does not solve (NP) directly, but rather 
solves the equivalent problem 



(NPi) minimize f(x) 



x,s 



subject lo -a = 0, l< r] <U. 



The formulation of problem (NPi) is chosen to match the problem formulation used 
by SNOPT. It is also closely related to that used by MINOS. As in those methods, 
our implementation distinguishes between variables in the vector x that appear and 
do not appear nonlincarly in the objective or the constraints; variables that appear 
only linearly are treated specially. The following discussion ignores this detail in order 
to keep the notation concise. 

The linearly constrained subproblcms corresponding to (NPi) take the form 



(ELCife) minimize Ck{x) + a^e (v + w) 



X,S.V,1V 



.ubjoc, to { c k + Mx-x k )+v- W ^_ s = ^ ,<Q< tt) 

< v, w. 



4.2. The main algorithm. The computational kernel of LCLOPT resides in 
the solution of each LC subproblem, and the efficiency of the implementation ulti- 
mately relies on the efficiency of the subproblem solver. The main tasks of the outer 
level are to form the subproblcms, update solution estimates, update parameters, and 
test for convergence or errors. 

4.3. Solving the LC subproblems. LCLOPT can use cither MINOS or SNOPT 
to solve (ELCife). For linearly constrained problems, MINOS uses a reduced-gradient 
method, coupled with a quasi-Newton approximation of the reduced Hessian of the 
the problem objective. SNOPT implements a sparse SQP method and maintains a 
limited-memory, quasi-Newton approximation of the Hessian of the problem objective. 
(In both cases, the problem objective will be the objective of (ELCifc).) For linearly 
constrained problems, SNOPT avoids performing an expensive Cholesky factorization 
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of the reduced Hessian for the quadratic programming subproblem in each of its own 
major iterations, and thus realizes considerable computational savings over problems 
with nonlinear constraints [?]. 

Both MINOS and SNOPT are available as libraries of Fortran 77 routines. We 
implemented MEX interfaces [?] written in C to make each of the routines from 
the MINOS and SNOPT libraries accessible from within Matlab. The subproblem 
solvers evaluate the nonlinear objective function (there are no nonlinear constraints 
in (ELCife)) through a generic MEX interface, f unObj . c. This routine makes calls to 
a Matlab routine to evaluate the nonlinear objective In turn, the routine for 
Ck makes calls to routines (available as Matlab or MEX routines) to evaluate the 
original nonlinear functions / and c. 

4.4. Computing an initial point. MINOS and SNOPT both ensure that all 
iterates remain feasible (to within a small tolerance) with respect to the bounds and 
linear constraints in (ELCife), which includes the bounds and linear constraints in 
(NPi). LCLOPT is therefore able to restrict the evaluation of the nonlinear functions 
/ and c to points in the latter region. A user of LCLOPT may thus introduce bounds 
and linear constraints into (NPi) to help guard against evaluation of the nonlinear 
functions at points where they are not defined. 

Before entering the first iteration of the stabilized LCL method, LCLOPT solves 
the following quadratic proximal-point (PP) problem: 



(PP2) minimize i||a; — x" 2 



2 ~ "T2 
X 

Ax 



subject to I < ( f ] < u, 



where x is a vector provided by the LCLOPT user. The solution of (PP2) is used 
as the initial point xq for the algorithm. The objective function of the PP problem 
helps find an xo reasonably close to x, while the constraints ensure that xo is feasible 
with respect to the bounds and linear constraints of (NPi). If (PP2) proves infcasiblc, 
(NPi) is declared infeasible and LCLOPT exits immediately with an error message. 
An alternative PP problem is based on the one-norm deviation from x: 



(PP1) minimize \\x — x\\i 



subject to I < {j\x) ~ U ' 



An advantage of (PP1) is that it can be reformulated and solved as a linear program, 
and its solution is therefore expected to lie on more constraint vertices. It is a cor- 
respondingly easier problem to solve for reduced-space solvers. SNOPT provides the 
option of solving either (PP1) or (PP2). LCLOPT can take advantage of this by 
initializing x = x and passing SNOPT the optimization problem 



minimize 

Ax 



subject to I < I . I < u, 



with the parameter Proximal Point set to either 1 or 2. (For MINOS, the constraints 
would have to be reformulated and a set of elastic variables introduced.) 
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The computational results presented in §5 were derived by using (PP2) to compute 
Xq. As suggested by Gill ct al. [?], a loose optimality tolerance on (PP2) is used to 
limit the computational expense of its solution: reducing the number of iterations and 
(typically) the number of superbasic variables. 

4.5. Early termination of the LC subproblems. The global convergence 
results for the stabilized LCL algorithm (cf. Lemma 3.4 and Theorem 3.6) assume 
that the optimality tolerances uj k for the subproblems converge to 0. This loose 
requirement allows much flexibility in constructing the sequence {ui k }. 

The solution estimates may be quite poor during early iterations. We expect slow 
progress during those iterations, even if they are solved to tight optimality tolerances. 
A loose tolerance may help limit the computational work performed by the subproblem 
solver during these early iterations. Near a solution, however, we wish to reduce the 
optimality tolerance quickly in order to take advantage of the fast local convergence 
rate predicted by Theorem 3.9. 

To construct the sequence {uj k }, we replace Step 1 of Algorithm 2 by 

u) ^min(o;fe, \\F(x k ,yk, Zk)\\lo) ^ ^ 

uj k+ i <— max(0.5ai,u;»), 

where w can be set by a user to any value between 0.5 and w*. The update (4.1) 
guarantees that uj k — > w*, as required. 

Following the prescription outlined in §2.2, we fix at a small value the feasibil- 
ity tolerance for satisfying the linearized constraints. The feasibility and optimality 
tolerances for each major iteration are passed to the subproblem solver as run-time 
parameters. 

4.6. Detecting infeasibility and unboundedness. As discussed in §3.3, Al- 
gorithm 2 will not exit if the optimization problem is infeasible and the infeasibility 
tolerance r?* is small. We declare (NPi) infeasible if at any given iteration k, x k is in- 
feasible with respect to the nonlinear constraints and the penalty parameter is greater 
than some threshold value. In particular, at Step 10, Algorithm 2 exits and (NPi) is 
declared infeasible if 

max(||[Z c - CfeJ+Hoo, \\[u c - c k ] + \\ x ) > r?, 

Pk > P, 

where l c and u c are the lower and upper bounds for the nonlinear constraints and 
[•]+ is the positive part of a vector. For the computational results in §5 the threshold 
value was set at p — 10 s . 

We also need to consider the possibility that (NPi) is unbounded- i.e., that the 
objective function / is unbounded below in the feasible region, or that ||x|| — ► oo. 
As with tests for infeasibility, any test for unboundedness must be ad hoc. We rely 
on the LC solver to help detect infeasibility. Problem (NPi) is declared unbounded 
and LCLOPT exits if the point x k is feasible and the LC solver reports (ELCifc) as 
unbounded. 

4.7. Summary of the stabilized LCL method. Following is a summary of 
the stabilized LCL method as implemented in LCLOPT. We assume that x is given 
and that the starting tolerances, u>q and 770, and parameters, po and cto, are set. 

1. Apply the LC solver to (PP1) or (PP2) to obtain a starting point xq that 
is feasible with respect to the bounds and linear constraints and reasonably 
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close to x. If the PP problem is infeasible, declare (NPi) infcasible and exit. 
Otherwise, set k = 0. 

2. Evaluate the functions and gradients at x k . Linearize the constraints and 
form (ELCifc). 

3. Apply the LC solver to (ELCifc) with optimality tolerance u k to obtain 
(x* k ,Ay* k ,z* k ).Sety* k =y k + Ay* k . 

4. If (ELCifc) is unbounded and x k is feasible, declare (NPi) unbounded and 
exit. If (ELCifc) is unbounded and x* k is infeasible, go to Step 8. Otherwise, 
continue. 

5. If x k meets the current nonlinear feasibility threshold r\ k , continue. Otherwise, 
go to Step 8. 

6. Update the solution estimate: {x k+ \,y k+ \,z k+ \) <— {x* k ,y k — p k c(x* k ),z k ). 
Keep the penalty parameter p k fixed and reset the elastic weight a k . 

7. Test convergence: If (x k +i, Vk+i, z k +i) satisfies the optimality conditions for 
(NPi), declare the current solution estimate optimal, return (x k +i , y k +i, z k +i), 
and exit. Otherwise, go to Step 9. 

8. If p k > p, declare (NPi) infeasible, return (x k , y k , z k ), and exit. Otherwise, 
discard the subproblem solution (i.e., (x k +i, yk+i, Zfc+i) <— {x k ,y k , z k)), in- 
crease the penalty parameter p k , and reduce the elastic weight a k . 

9. Set the next nonlinear feasibility threshold r) k+ i and LC subproblem optimal- 
ity tolerance cj k+ i, so that {(Lu k ,n k )} — > (a;*,??*). 

10. Set k <— k + 1. Return to Step 2. 

5. Numerical Results. This section summarizes the results of applying our im- 
plementation of the stabilized LCL method, LCLOPT, to a subset of nonlinearly con- 
strained test problems from the COPS 2.0 [?], Hock-Schittkowski [?], and CUTE [?] 
test suites. Two versions of LCLOPT are applied to each test problem: The first ver- 
sion uses AMPL/MINOS 5.5 [?], version 19981015, to solve the sequence of linearly 
constrained subproblcms; the second version uses SNOPT version 6.1-1(5). 

We used the AMPL versions of all problems, as formulated by Vanderbei [?]. 
A MEX interface to the AMPL libraries makes functions and gradients available in 
Matlab (see Gay [?] for details on interfacing external routines to AMPL). All runs 
were conducted on an AMD Athlon 1700XP using 384 MB of RAM, running Linux 
2.4.18. 

Figure 5 shows the performance profiles, as described by Dolan and More [?], of 
the two versions of LCLOPT (the dotted and dashed lines) and MINOS (the solid 
line). The three charts of that figure show performance profiles for the total num- 
ber of nonlinear function evaluations, minor iterations, and major iterations. All the 
problems selected from the COPS, Hock-Schittkowski, and CUTE test suites are in- 
cluded in each profile. The performance profiles describe the percentage of problems 
successfully solved (the vertical axes) within a factor r of the best-performing solver 
(the horizonal axes). 

By all measures, LCLOPT, using MINOS to solve the subproblcms, successfully 
solves the largest proportion of problems and proves to be the most reliable method. 
Compared with MINOS, LCLOPT tends to require more minor iterations (a measure 
of total computational work) but fewer major iterations to reach a solution. We 
comment further on this fact in §6.1. 

For all problems that can vary in the number of constraints and variables, we 
describe their dimensions. The following heads are used in Tables 5.1 and 5.4. 
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Fig. 5.1. Performance profiles. The vertical axes represent the percentage of problems suc- 
cessfully solved within a factor t of the best solver. The horizontal axes are based on a log scale. 
Performance profiles are shown for the number of nonlinear function evaluations, minor iterations, 
and major iterations. The profiles include the results of the 135 selected test problems. 
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Head Dimension 



m Constraints (linear and nonlinear) 

m c Nonlinear constraints 

n Variables 

n c Variables appearing nonlinearly in c 

rif Variables appearing nonlinearly in / 



5.1. Default parameters. Figure 5.2 shows the options files that LCLOPT 
uses for the LC solvers. These are fixed for all subproblcms. Separately, at each 
major iteration, LCLOPT sets the parameter Dptimality Tolerance in MINOS and 
the parameter Major Dptimality Tolerance in SNOPT. These are equivalent to 
the subproblem optimality tolerance u>k (§4.5 outlines the method for choosing this 
parameter) . 

Each test problem supplies a default starting point. This point is used as x in 
the proximal-point problem (see §4.4). The initial vector of multiplier estimates yo is 
set to zero. 

Both MINOS and SNOPT provide the option to reuse a quasi-Newton approxima- 
tion of a Hessian from a previous solve: MINOS approximates the reduced Hessian; 
SNOPT approximates the full Hessian. We take advantage of this feature for all 
iterations fc = 2, 3,4, ... by setting the MINOS and SNOPT options Start = 'Hot'. 

The parameters used by Algorithm 2 are set as follows. The upper and lower 
bounds of the elastic penalty parameters are a = 1 and a — 10 4 . The initial elastic 
weight is go = 10 2 . (Normally, LCLOPT scales this quantity by 1 + ||yo||oo, but the 
scaling has no effect for these test runs because yo = 0.) The penalty scaling factors 
are r p = 100 5 and T a = 10. As suggested in [?], we set a = 0.1 and (3 = 0.9. 
The initial penalty parameter is po = 10 5 / 2 /m c , where m c is the number of nonlinear 
constraints. The final optimality and feasibility tolerances are w» = 77* = 10~ 6 . The 
initial optimality and feasibility tolerances are w = 10~ 3 (= v 7 ^*) an d Vo = 1- 

In all cases, default options, with the exception of Major Iterations 500 and 
Superbasics Limit 2000, are used for the MINOS benchmarks. 



BEGIN LCL SUBPROBLEM 






Scale option 







Superbasics limit 




2000 


Iterations 




5000 


Feasibility tol 


1 


Oe-6 


END LCL SUBPROBLEM 






(a) The MINOS specs file 





BEGIN LCL SUBPROBLEM 






Scale option 







Superbasics limit 




2000 


Iterations 




5000 


Major iterations 




1000 


Minor iterations 




500 


Minor feasibility tol 


1 


0e-6 


Minor optimality tol 


2 


5e-7 


END LCL SUBPROBLEM 






(b) The SNOPT specs 


file 





Fig. 5.2. The fixed optional parameters for every subproblem solve. The optimality tolerance 
u>k is specified by LCLOPT for each fc. 



5.2. The COPS test problems. The COPS 2.0 collection [?] comprises 17 
problems. Five problems are excluded for the following reasons: 

• 3 problems are unconstrained: bearing, minsurf, and torsion; 

• 2 problems cause system errors when called using the AMPL MEX interface: 
glider and marine. 
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The dimensions of the COPS test problems can be adjusted. In all cases, the solvers 
were applied to the largest version of the problem (as specified by the AMPL model) 
that would not cause the system to age memory to disk. Table 5.1 summarizes the 
dimensions of the selected problems. 

Table 5.1 

Dimensions: The 12 selected COPS test problems 



Problem 


m 


m c 


n 


n c 


71/ 


camshape 


1604 


801 


800 


800 





catmix 


1603 


1600 


2403 


2403 





chain 


204 


1 


402 


201 


402 


channel 


800 


400 


800 


800 





elec 


201 


200 


600 


600 


600 


gasoil400 


4004 


3200 


4003 


4003 


202 


marine 


1208 


800 


1215 


1215 


344 


methanol 


2406 


1800 


2405 


1605 


1670 


pinene 


4006 


3000 


4005 


2405 


2469 


polygon 


1377 


1225 


100 


100 


100 


robot 


2414 


2400 


3611 


3209 





rocket 


2409 


1200 


1605 


1605 





steering 


2011 


1600 


2007 


1204 






As shown in Table 5.2, the version of LCLOPT using MINOS for the subproblems 
solved all 12 problems to first-order optimality. The version using SNOPT solved 11 
problems to first-order optimality; the exception was robot, which it declared as having 
infeasible nonlinear constraints. MINOS solved 10 of the 12 problems to optimality; it 
declared steering an infeasible problem, and it terminated the solution of elec because 
of excessive iterations. Feasible points exist for all of the test problems chosen, so we 
consider all declarations of infeasibility to be errors. 

Table 5.2 

Summary: The 12 selected COPS test problems 





LCLOPT 






(MINOS) 


(SNOPT) 


MINOS 


Optimal 


12 


11 


10 


False Infeasibility 




1 


1 


Terminated by iteration limit 






1 


Major iterations 


118 


179 


380 


Minor iterations 


53950 


147518 


61388 


Function evaluations 


53081 


11014 


63701 



We note that different local optima appear to have been found for problems 
camshape, methanol, polygon, and rocket. An excessive number of minor iterations 
were required by LCLOPT on catmix, elec, and robot with SNOPT as its subprob- 
lem solver. Especially during early major iterations, SNOPT was unable to solve the 
LC subproblems to the required optimality tolerance within the 5000 iteration limit. 
Rather than terminate with an error message, LCLOPT forces SNOPT to keep work- 
ing on the same subproblem until it returns a solution within the required optimality 
tolerance. In practice, a different strategy would be adopted, but our goal here is 
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to test the robustness of the outer iterations (the stabilized LCL method), not the 
robustness of the subproblem solvers. 

5.3. The Hock-Schittkowski test problems. The HS test suite contains 86 
nonlinearly constrained problems [?]. These are generally small and dense problems. 
We exclude 5 problems from this set for the following reasons: 

• 3 problems are not smooth: hs67, hs85, and hs87; 

• 2 problems require external functions: hs68 and hs69. 

Both versions of LCLOPT solved the same 80 problems to first-order optimal- 
ity, but both declared hsl09 infcasiblc. MINOS solved 80 problems to first-order 
optimality but declared hs93 infeasible. 



Table 5.3 

Summary: The 81 selected Hock-Schittkowski test problems 





LCLOPT 






(MINOS) 


(SNOPT) 


MINOS 


Optimal 


80 


80 


80 


False infeasibility 


1 


1 


1 


Major iterations 


654 


648 


1160 


Minor iterations 


7415 


25290 


10111 


Function evaluations 


12269 


14712 


27127 



On hsl3, all the solvers reached different solutions. However, the linear inde- 
pendence constraint qualification does not hold at the solution of this problem — this 
violates the required assumptions for both LCLOPT and MINOS. 

Recall that LCLOPT and MINOS use only first derivatives and hence may not 
necessarily converge to local solutions of a problem. For example, LCLOPT (in both 
versions) converged to a known local solution of hsl6, but MINOS converged to some 
other first-order point. In contrast, MINOS converged to the known local solutions 
of hs97 and hs98, while LCLOPT (in both versions) converged to other first-order 
points. Similar differences exist for problems hs47 and hs77. 



5.4. A selection of CUTE test problems. With the select utility [?], we 
extracted from the CUTE test suite dated September 7, 2000, problems with the 
following characteristics (* is a wild-card character): 



Objective function type 


* 


Constraint type 


Q (quadratic, general nonlinear) 


Regularity 


R (smooth) 


Degree of available derivatives 


1 (first derivatives, at least) 


Problem interest 


M R (modeling, real applications) 


Explicit internal variables 


* 


Number of variables 


* 


Number of constraints 


* 



These criteria yield 108 problems. We exclude 66 problems from this set for the 
following reasons: 



• 33 problems do not have AMPL versions: car2, c-reload, dembo7, drugdis, durgdise, 
errinbar, junkturn, leaknet, lubrif, mribasis, nystromS, orbitZ, reading^, readings, 
reading6, reading7, reading8, reading9, rotodisc, saromm, saro, tenbarsl, tenbars2, 
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tenbarsS, tenbars4, trigger, truspyrl, truspyr2, zamb2, zamb2-8, zamb2-9, zamb2- 
10, and zamb2-ll; 

• 21 problems cause system errors when evaluated either by the AMPL MEX interface 
or by MINOS (when invoked from AMPL): brainpc2, brainpc3, brainpc4, brainpcS, 
brainpc6, brainptf, brainpcS, brainpc9, bratu2dt, crescl32, csfil, csfi2, drcavllq, 
drcav2lq, drcav3lq, kissing, lakes, porousl, porous2, train], and trainh; 

• The AMPL versions of 12 problems are formulated with no nonlinear constraints: 
drcavtyl, drcavty2, drcavty3, flosp2hh, flosp2hl, flosp2hm, flosp2th, flosp2tl, 
flosp2tm, methanb8, methanW, and res. 

The dimensions of 17 of the remaining 42 problems can be adjusted. In all cases, 
the solvers were applied to the largest problem versions that would not cause the 
system to page memory to disk. Table 5.4 summarizes the dimensions of the selected 
problems that can vary in size. 



Table 5.4 

Dimensions of the variable-size CUTE test problems 



Problem 


m 


m c 


n 


n c 


n f 


bdvalue 


1000 


1000 


1000 


1000 





bratu2d 


4900 


4900 


4900 


4900 





bratu3d 


512 


512 


512 


512 





cbratu2d 


882 


882 


882 


882 





cbratu3d 


1024 


1024 


1024 


1024 





chandheq 


100 


100 


100 


100 





chemrcta 


2000 


1996 


2000 


1996 





chemrctb 


1000 


998 


1000 


998 





clnlbeam 


1001 


500 


1499 


499 


1000 


hadamard 


257 


128 


65 


64 


65 


manne 


731 


364 


1094 


364 


729 


readingl 


5001 


5000 


10001 


10000 


10000 


reading3 


103 


101 


202 


202 


202 


sreadin3 


5001 


5000 


10000 


9998 


9998 


ssnlbeam 


21 


10 


31 


11 


22 


svanberg 


1001 


1000 


1000 


1000 


1000 


ubh5 


14001 


2000 


19997 


6003 






The version of LCLOPT using MINOS solved 36 of 42 problems to first-order op- 
timality, while the version using SNOPT solved 34 problems to first-order optimality. 
MINOS solved 34 problems to first-order optimality. Table 5.5 summarizes these re- 
sults. We note that LCLOPT, in one of its two versions, solves every problem except 
heartd, which it declares infeasible. With the exception of cresc50, LCLOPT with 
SNOPT does not seem to suffer (on successful solves) from excessive minor iterations 
resulting from subproblem restarts, as it does on the COPS problems. 

6. Conclusions. The stabilized LCL method developed in this paper is a gen- 
eralization of the augmented Lagrangian methods discussed in §3 and it shares the 
strengths of its predecessors: it is globally convergent (the BCL advantage) and it 
has fast local convergence (the LCL advantage). The ^i-penalty function brings the 
two together. Because the stabilized LCL method operates in a reduced space given 
by the linearized constraints (like the LCL method), it does not suffer from the ill- 
conditioning effects that can plague BCL methods. 
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Table 5.5 

Summary: The ^2 selected CUTE test problems 



LCLOPT 





(MINOS) 


(SNOPT) 


MINOS 


Optimal 


36 


34 


34 


False infeasibility 


4 


3 


3 


Terminated by iteration limit 


1 


3 


1 


Terminated by superbasics limit 




1 




Unbounded/badly scaled 






3 


Final point cannot be improved 


1 


1 


1 


Major iterations 


400 


368 


1149 


Minor iterations 


70476 


278162 


29021 


Function evaluations 


59216 


57732 


53069 



6.1. Importance of early termination. The numerical results presented in 
§5 demonstrate that MINOS successfully solved many of the test problems using rela- 
tively few minor iterations. MINOS terminates its progress on each of its subproblcms 
after 40 iterations (to avoid a refactorization of the current basis, which by default 
occurs every 50 iterations). In contrast, LCLOPT attempts to constrain the subprob- 
lem iterations by means of an initially loose optimality tolerance (we set ojo = ^fUJZ 
for the runs shown in §5). A potential weakness of this approach vis a vis MINOS 
is that there is no a priori bound on the number of subproblcm iterations. MINOS's 
aggressive (and heuristic) strategy seems effective in keeping the total minor iteration 
counts low. This property is particularly important during the early major iterations, 
when the current solution estimates are poor. 

It may be possible to emulate the MINOS strategy and still satisfy the require- 
ment that the subproblem optimality tolerances u k converge to zero (cf. Lemma 3.4). 
For example, LCLOPT might truncate the subproblcm solutions after a fixed num- 
ber of iterations, and only gradually increase the iteration limit on successive major 
iterations. Especially during early major iterations, such a strategy may keep the 
accumulated number of subproblem iterations small. During later major iterations, 
the strategy would still ensure that the subproblem solver returns solutions within 
the prescribed tolerance u k . 

On the other end of the performance spectrum lies the issue of recovering LCL's 
fast local convergence rate under inexact solves (cf. §3.2.1). Brauninger [?] proves that 
the quadratic convergence rate of Robinson's method is retained when u k is reduced 
at a rate 0(\\F(x k ,y k , z k )\\ 2 ) (cf. Theorem 3.9). The first-order KKT conditions (2.1) 
for the LCL subproblem can be expressed as 

(Vl x L k {x k ) ^ + 0(||p||2) = (-g k + Jjy k ^ (fU) 

where p d =x — x k , and a first-order Taylor expansion was used to derive the residual 
term 0(||p|| 2 ). (We have ignored bound constraints for the moment. Robinson [?, ?] 
shows that the correct active set is identified by the subproblems near a solution.) 
The nonlinear equations (6.1) are closely related to the linear equations that would be 
derived from applying Newton's method to (2.1) (again, ignoring bound constraints). 
In that case, the theory from inexact Newton methods (Dembo et al. [?]) predicts 



32 



M. P. FRIEDLANDER AND M. A. SAUNDERS 



that the quadratic convergence rate is recovered when the residual error is reduced at 
the rate 0(\\F(xk, Uk, z k)\[)- The similarity between (6.1) and the Newton equations 
hints at the possibility of recovering the quadratic convergence rate of the LCL and 
stabilized LCL methods by reducing ujk at the rate 0(\\F(xk, yk, z k)\\)- We note, 
however, that stronger assumptions may be needed on the smoothness of the nonlinear 
functions. This issue deserves more study. 

6.2. Keeping the penalty parameter small. Preliminary experimentation 
reveals that a small penalty parameter pk can significantly reduce the difficulty of 
each subproblem solve. BCL methods require that pk be larger than some threshold 
value p. In contrast, LCL methods can converge when pk = if they are started near 
a solution (see §3.9). 

The challenge here is to find a strategy that can keep pk small or reduce it 
without destabilizing the method. A tentative strategy might be to reduce pk only 
finitely many times. This approach does not violate the hypotheses of Lemma 3.4, 
and may be effective in practice. A form of this strategy was used for the runs shown 
in §5. 

6.3. A second-derivative LC solver. We prove in §3.4 that the stabilized LCL 
method will converge to second-order stationary points if the LC subproblems are 
solved to second-order points (for example, by using a second-derivative LC solver). 
In practice, however, a second-derivative LC solver may be most useful as a means of 
reducing the overall computational work required by the stabilized LCL method. 

The stabilized LCL method is largely independent of the method in which its 
subproblems are solved. An LC solver using second derivatives is likely to require 
fewer iterations (and hence less computational work) for the solution of each of the 
subproblem. We would expect the number of required major iterations to remain 
constant if each subproblem solution is computed to within the prescribed tolerance 
ujk- However, we would expect to reduce the number of required major iterations if a 
MINOS-like strategy is used to terminate the subproblems (see §6.1). Over the same 
number of iterations, a subproblem solver using second derivatives may make more 
progress toward a solution than a first-derivative solver. 

Any future implementation of the stabilized LCL method would ideally be flexible 
enough to allow for a variety of solvers to be used for the LC subproblems. The choice 
of the subproblem solver could then be guided by the characteristics of the optimi- 
zation problem at hand. In particular, the advent of automatic differentiation makes 
second derivatives increasingly available for certain problem classes, e.g., within recent 
versions of GAMS and AMPL, and for more general functions defined by Fortran or C 
code, notably ADIFOR and ADIC (Bischof et al. [?, ?]). These may be used by SQP 
and interior methods for nonlinearly constrained (NC) problems (e.g., LOQO Van- 
dcrbci [?]). Certain theoretical challenges might be avoided, however, by developing 
specialized second-derivative LC solvers. Such LC solvers could be extended readily 
to general NC problems by incorporating them into the stabilized LCL algorithm. 
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